Inference of High-dimensional Autoregressive Generalized Linear Models
Authors: E. C. Hall, G. Raskutti, R. M. Willett
Abstract
Vector autoregressive models characterize a variety of time series in which linear combinations of current and past observations can be used to accurately predict future observations. For instance, each element of an observation vector could correspond to a different node in a network, and the parameters of an autoregressive model would correspond to the impact of the network structure on the time series evolution. Often these models are used successfully in practice to learn the structure of social, epidemiological, financial, or biological neural networks. However, little is known about statistical guarantees of estimates of such models in non-Gaussian settings. This paper addresses the inference of the autoregressive parameters and associated network structure within a generalized linear model framework that includes Poisson and Bernoulli autoregressive processes. At the heart of this analysis is a sparsity-regularized maximum likelihood estimator. While sparsity-regularization is well-studied in the statistics and machine learning communities, those analysis methods cannot be applied to autoregressive generalized linear models because of the correlations and potential heteroscedasticity inherent in the observations. Sample complexity bounds are derived using a combination of martingale concentration inequalities and modified covering techniques originally proposed for high-dimensional linear regression analysis. These bounds, which are supported by several simulation studies, characterize the impact of various network parameters on estimator performance.

1 Autoregressive Processes in High Dimensions

Imagine recording the times at which each neuron in a biological neural network fires or “spikes”. Neuron spikes can trigger or inhibit spikes in neighboring neurons [1, 2, 3, 4, 5, 6, 7], and understanding excitation and inhibition among neurons provides key insight into the structure and operation of the underlying neural network.
A central question in the design of this experiment is “for how long must I collect data before I can be confident that my inference of the network is accurate?” Clearly the answer to this question will depend not only on the number of neurons being recorded, but also on what we may assume a priori about the network. Unfortunately, existing statistical and machine learning theory gives little insight into this problem.

Neural spike recordings are just one example of a non-Gaussian, high-dimensional autoregressive process, where the autoregressive parameters correspond to the structure of the underlying network. This paper examines a broad class of such processes, in which each observation vector is modeled using an exponential family distribution.

In general, autoregressive models are a widely used mechanism for studying time series in which each observation depends on the past sequence of observations. Inferring these dependencies is a key challenge in many settings, including finance, neuroscience, epidemiology, and sociology. A precise understanding of these dependencies facilitates more accurate predictions and interpretable models of the forces that determine the distribution of each new observation.

∗E. C. Hall is with the Wisconsin Institute of Discovery, University of Wisconsin-Madison, Madison, WI, 53706, USA.
†G. Raskutti is with the Department of Statistics, University of Wisconsin-Madison, Madison, WI, 53706, USA.
‡R. M. Willett is with the Department of Electrical and Computer Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA.
We gratefully acknowledge the support of the awards NSF CCF-1418976, NIH 1 U54 AI117924-01, 14AFOSR-1103, and NSF DMS-1407028.
arXiv:1605.02693v1 [stat.ML] 9 May 2016

Much of the autoregressive modeling literature focuses on Gaussian noise and perturbation models, but in many settings Gaussian noise fails to capture the data at hand. This challenge arises, for instance, when observations correspond to count data, e.g., when we collect data by counting individual events such as neuron spikes. Another example arises in epidemiology, where a common model involves infection traveling stochastically from one node in a network to another based on the underlying network structure in a process known as an “epidemic cascade” [8, 9, 10, 11]. These models are used to infer network structure from observed infection times, a setting closely related to the Bernoulli autoregressive model studied in this paper. Further examples arise in a variety of applications, including vehicular traffic analysis [12, 13], finance [14, 15, 16, 17], social network analysis [18, 19, 20, 21, 22], biological neural networks [1, 2, 3, 4, 5, 6, 7], power systems analysis [23], and seismology [24, 25].

Because of their prevalence across application domains, time series count data (cf. [26, 27, 28, 29, 30]) and other non-Gaussian autoregressive processes (cf. [31, 32, 33]) have been studied for decades. Although a substantial fraction of this literature is focused on univariate time series, this paper focuses on multivariate settings, particularly where the vector observed at each time is high-dimensional relative to the duration of the time series. In the above examples, the dimension of each observation vector would be the number of neurons in a neural network, the number of people in a social network, or the number of interacting financial instruments. In this paper, we conduct a detailed investigation of a particular family of time series that we call the vector generalized linear autoregressive (GLAR) model.
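To give a feel for the kind of process described above, here is a minimal Python sketch that simulates a small Bernoulli autoregressive network. The parameterization is an assumption made for illustration, since this excerpt does not state the model equations: each node fires at time t+1 with probability sigmoid(nu[m] + sum_j A[m][j] * X[t][j]). The names `simulate_bernoulli_ar`, `A`, and `nu`, and the 3-node network, are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def simulate_bernoulli_ar(A, nu, T, seed=0):
    """Simulate T steps of an assumed Bernoulli autoregressive process.

    Assumed form (illustrative, not quoted from the paper):
        X[t+1][m] ~ Bernoulli(sigmoid(nu[m] + sum_j A[m][j] * X[t][j]))
    Returns a list of T + 1 state vectors, starting from the all-zero state.
    """
    rng = random.Random(seed)
    M = len(nu)
    x = [0] * M
    path = [x]
    for _ in range(T):
        nxt = []
        for m in range(M):
            z = nu[m] + sum(A[m][j] * x[j] for j in range(M))
            nxt.append(1 if rng.random() < sigmoid(z) else 0)
        x = nxt
        path.append(x)
    return path


# Hypothetical 3-node network: node 0 excites node 1, node 1 inhibits node 2.
A = [[0.0, 0.0, 0.0],
     [2.0, 0.0, 0.0],
     [0.0, -2.0, 0.0]]
nu = [-1.0, -1.0, 0.0]
path = simulate_bernoulli_ar(A, nu, T=1000)

# Empirical firing rate of each node over the simulated record.
rates = [sum(x[m] for x in path) / len(path) for m in range(3)]
print(rates)
```

In this sketch the rows of `A` play the role of the network structure: a positive entry makes a spike in one node raise the firing probability of another at the next step, and a negative entry lowers it.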
In addition, we examine our results for two members of this family: the Bernoulli autoregressive model and the log-linear Poisson autoregressive (PAR) model. The PAR model has been explicitly studied in [34, 35, 36] and is closely related to the continuous-time Hawkes point process model [37, 38, 39, 40] and the discrete-time INGARCH model [41, 42, 43, 44]. However, that literature does not contain the sample complexity results presented here.

This paper focuses on estimating the parameters of a vector GLAR model from a time series of observations. We adopt a regularized likelihood estimation approach that extends and generalizes our previous work on Poisson inverse problems (cf. [45, 46, 47, 48]). While similar algorithms have been proposed in the above-mentioned literature, little is known about their sample complexity or how inference accuracy scales with key parameters such as the size of the network or number of entities observed, the time spent collecting observations, and the density of edges within the network or dependencies among entities.

There has been a large body of work providing theoretical results for certain high-dimensional models under low-dimensional structural constraints (see e.g., [49, 48, 50, 51, 52, 53, 54, 55]). The majority of prior work has focused on the setting where samples are independent and/or follow a Gaussian distribution. In the GLAR setting, however, non-Gaussianity and temporal dependence among observations can make such analyses particularly challenging and beyond the scope of much current research in high-dimensional statistical inference (see [56] for an overview). Perhaps the prior work most closely related to ours in the high-dimensional setting is [57]. In [57], several performance guarantees are provided for different linear Gaussian problems with dependent samples, including the Gaussian autoregressive model.
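As a rough illustration of the regularized likelihood approach mentioned in this section, the following Python sketch fits one row of a log-linear Poisson autoregressive model with an l1 penalty. Several things here are assumptions, not details from the paper: the model form X[t+1][m] ~ Poisson(exp(nu[m] + sum_j A[m][j] * X[t][j])), a known offset `nu_m`, the penalty weight `lam`, and the solver (proximal gradient with backtracking, chosen only because it is easy to write down). All function names are hypothetical.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson(lam) sample with Knuth's method (fine for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_par(A, nu, T, seed=0):
    """Simulate the assumed log-linear Poisson autoregressive process."""
    rng = random.Random(seed)
    M = len(nu)
    x = [0] * M
    path = [x]
    for _ in range(T):
        x = [poisson(rng, math.exp(nu[m] + sum(A[m][j] * x[j] for j in range(M))))
             for m in range(M)]
        path.append(x)
    return path


def neg_log_lik(a, nu_m, X, y):
    """Average Poisson negative log-likelihood, dropping the log(y!) constant."""
    total = 0.0
    for x, yt in zip(X, y):
        eta = nu_m + sum(ai * xi for ai, xi in zip(a, x))
        # Guard against overflow: treat absurdly large rates as infinite loss.
        total += (math.exp(eta) if eta < 700 else float("inf")) - yt * eta
    return total / len(y)


def gradient(a, nu_m, X, y):
    """Gradient of neg_log_lik with respect to the row a."""
    g = [0.0] * len(a)
    for x, yt in zip(X, y):
        eta = nu_m + sum(ai * xi for ai, xi in zip(a, x))
        r = math.exp(eta) - yt
        for j, xj in enumerate(x):
            g[j] += r * xj
    return [gj / len(y) for gj in g]


def soft_threshold(v, tau):
    """Proximal operator of tau * |v|."""
    return math.copysign(max(abs(v) - tau, 0.0), v)


def objective(a, nu_m, X, y, lam):
    return neg_log_lik(a, nu_m, X, y) + lam * sum(abs(ai) for ai in a)


def fit_row(X, y, nu_m, lam, iters=100):
    """l1-penalized Poisson fit by proximal gradient with backtracking."""
    a = [0.0] * len(X[0])
    step = 1.0
    for _ in range(iters):
        f, g = neg_log_lik(a, nu_m, X, y), gradient(a, nu_m, X, y)
        while True:
            new = [soft_threshold(ai - step * gi, step * lam) for ai, gi in zip(a, g)]
            d = [n - ai for n, ai in zip(new, a)]
            bound = (f + sum(gi * di for gi, di in zip(g, d))
                     + sum(di * di for di in d) / (2 * step))
            if neg_log_lik(new, nu_m, X, y) <= bound + 1e-12:
                break
            step *= 0.5  # shrink the step until sufficient decrease holds
        a = new
    return a


# Hypothetical network: node 1 inhibits node 0; all other weights are zero.
A_true = [[0.0, -1.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
path = simulate_par(A_true, [0.0, 0.0, 0.0], T=500)
X, y = path[:-1], [x[0] for x in path[1:]]  # past states and node-0 counts
lam = 0.05
a_hat = fit_row(X, y, nu_m=0.0, lam=lam)
print(a_hat)
```

Note that the rows of `X` are themselves past observations of the process, so the "design" is generated by the same network being estimated. The example uses only inhibitory weights so that the simulated rates stay bounded.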
Since [57] deals exclusively with linear Gaussian models, it exploits many properties of linear systems and Gaussian random variables that cannot be applied to non-Gaussian and non-linear autoregressive models. In particular, compared to standard autoregressive processes with Gaussian noise, in the GLAR setting the conditional variance of each observation depends on previous data instead of being a constant equal to the noise variance. Works such as [48, 49, 58] provide results for non-Gaussian models but still rely on independent observations. Weighted LASSO estimators for Hawkes processes address some of these challenges in a continuous-time setting [40].

To see why GLAR analysis can be challenging, consider momentarily a LASSO estimator of the autoregressive parameters. In the classical LASSO setting, the accuracy of the estimate depends on characteristics of the Gram matrix associated with the design or sensing matrix. This matrix may be stochastic, but it is usually considered independent of the observations, and performance guarantees for the estimator depend on the assumption that the matrix obeys certain properties (e.g., the restricted eigenvalue condition [59]). In our setting, however, the “design” matrix is a function of the observed data, which in turn depends on the true underlying network or autoregressive model parameters. Thus a key challenge in the analysis of a LASSO-like estimator in the GLAR setting involves showing that the data- and network-dependent Gram matrix exhibits properties that ensure reliable estimates.

In this paper, we develop performance guarantees for the vector GLAR model that characterize sample complexity in the high-dimensional setting under low-dimensional structural assumptions such as sparsity of the un-
Related resources
Count Time Series Models
We review regression models for count time series. We discuss the approach which is based on generalized linear models and the class of integer autoregressive processes. The generalized linear models framework provides convenient tools for implementing model fitting and prediction using standard software. Furthermore, this approach provides a natural extension to the traditional ARMA methodolog...

Bayesian Inference for Spatial Beta Generalized Linear Mixed Models
In some applications, the response variable assumes values in the unit interval. The standard linear regression model is not appropriate for modelling this type of data because the normality assumption is not met. Alternatively, the beta regression model has been introduced to analyze such observations. A beta distribution represents a flexible density family on (0, 1) interval that covers symm...

Modelling Exponential Family Time Series Data
In this paper we have proposed a class of Generalized Autoregressive Moving Average (GARMA) models which extend univariate ARMA models to a non-Gaussian situation (i.e. they extend the univariate Generalized Linear Model to incorporate time dependence in the observations). The simplicity of the fitting algorithm within the iteratively reweighted least squares (IRLS) framework will be shown. Model...

Statistical Inference in Autoregressive Models with Non-negative Residuals
Normality of the residuals is a usual assumption of autoregressive models, but in practice we sometimes face the case of non-negative residuals. In this paper we consider some autoregressive models with non-negative residuals as competing models and derive the maximum likelihood estimators of their parameters based on the modified approach and the EM algorithm. Also,...

Comparison of autoregressive integrated moving average (ARIMA) model and adaptive neuro-fuzzy inference system (ANFIS) model
Proper models for prediction of time series data can be an advantage in making important decisions. In this study, we compare one of the most useful classic models of economic evaluation, the autoregressive integrated moving average model, with one of the most useful artificial intelligence models, the adaptive neuro-fuzzy inference system (ANFIS), to investigate modeling procedur...

Journal: CoRR
Volume: abs/1605.02693
Publication date: 2016